An Active Learning Method for Speaker Identity Annotation in Audio Recordings
Abstract
Because manual speech annotation is an expensive and time-consuming process, this paper attempts to assist a human annotator in performing speaker diarization. This assistance is intended for annotating large volumes of archives. We propose a method that reduces the number of human interventions: it corrects a diarization output by taking the annotator's interventions into account. Experiments are carried out on French broadcast TV shows drawn from the ANR-REPERE evaluation campaign. The method is evaluated mainly in terms of KSR (Keystroke Saving Rate), and it reduces the number of actions needed to correct a speaker diarization output by 6.8% in absolute terms.
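The abstract reports its results as a KSR but does not define it. The sketch below assumes the definition commonly used in text-entry evaluation, KSR = 100 × (1 − assisted actions / baseline actions), where "actions" are the annotator's correction operations; this definition, the function names, and the counts are hypothetical and are not taken from the paper. A second helper shows why an absolute reduction (in percentage points) differs from a relative one.

```python
def keystroke_saving_rate(baseline_actions: int, assisted_actions: int) -> float:
    """Return the KSR in percent.

    Assumed definition (borrowed from text-entry evaluation, not confirmed
    by the paper): the share of correction actions saved when the annotator
    is assisted, relative to correcting the same output without assistance.
        KSR = 100 * (1 - assisted_actions / baseline_actions)
    """
    if baseline_actions <= 0:
        raise ValueError("baseline_actions must be positive")
    if assisted_actions < 0:
        raise ValueError("assisted_actions must be non-negative")
    return 100.0 * (1.0 - assisted_actions / baseline_actions)


def absolute_vs_relative(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Compare two percentages.

    Returns (absolute drop in percentage points,
             relative drop in percent of the starting value).
    """
    if before_pct == 0:
        raise ValueError("before_pct must be non-zero for a relative drop")
    absolute = before_pct - after_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the paper.
    ksr = keystroke_saving_rate(baseline_actions=1000, assisted_actions=932)
    print(f"KSR = {ksr:.1f}%")  # KSR = 6.8%

    # A 6.8-point absolute drop from 40.0% to 33.2% is a 17% relative drop.
    abs_drop, rel_drop = absolute_vs_relative(40.0, 33.2)
    print(f"{abs_drop:.1f} points absolute, {rel_drop:.1f}% relative")
```

The abstract does not state which baseline the 6.8% is measured against, so the numbers above illustrate the arithmetic only and should not be read as a reconstruction of the reported result.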
Similar resources
AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking
Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, vi...
The Ta2 Database - a Multi-modal Database from Home Entertainment
This paper presents a new database containing high-definition audio and video recordings in a rather unconstrained video-conferencing-like environment. The database consists of recordings of people sitting around a table in two separate rooms communicating and playing online games with each other. Extensive annotation of head positions, voice activity and word transcription has been performed on...
Variable print quality
In the literature, much research work has been done in the area of speaker verification. The developments include different types of speaker verification techniques, methods for feature extraction, measures for telephone channel compensation, system robustness, etc. In contrast, the problem of acoustic feature selection for speaker verification has been relatively neglected. Hence our aim is to...
The SRI Speech-Based Collaborative Learning Corpus
We introduce the SRI speech-based collaborative learning corpus, a novel collection designed for the investigation and measurement of how students collaborate together in small groups. This is a multi-speaker corpus containing high-quality audio recordings of middle school students working in groups of three to solve mathematical problems. Each student was recorded via a head-mounted noise-canc...